Abstract. We introduce a formalism for computing bond percolation properties
of a class of correlated and clustered random graphs. This class of graphs is 
a generalization of the Configuration Model where nodes of different types are 
connected via different types of hyperedges, edges that can link more than 2 
nodes. We argue that the multitype approach coupled with the use of clustered 
hyperedges can reproduce a wide spectrum of complex patterns, and thus enhances 
our capability to model real complex networks. As an illustration of this claim, 
we use our formalism to highlight unusual behaviors of the size and composition 
of the components (small and giant) in a synthetic, albeit realistic, social network. 



1. Introduction 

Bond percolation is the study of the size distribution of components in graphs 
whose edges exist with a given probability. For its theoretical appeal and its varied 
applications in many contexts, mathematical modelling of bond percolation on random 
graphs has recently received substantial attention (see [1, 2], and references therein). 
Within the Configuration Model (CM) paradigm [3, 4], many exact results can be 
obtained using probability generating functions (PGF) [5]. This analytic tractability 
however comes at the price of simplifying assumptions on the structure of the graphs. 

We introduce a generalization of the CM that encompasses many of the previous 
improvements published to this day [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
20], and brings this class of models closer to the behavior of real complex networks. 
By combining the multitype approach of [6], the analytical method of [21] and the 
one-mode projection of [16], we argue that our model is able to reproduce a wide 
range of complex patterns found in real networks. 

On the one hand, the multitype approach allows one to explicitly prescribe how nodes
are connected to one another in a very detailed fashion. By assigning types to nodes
(in other words, by knowing who is who, and therefore who is connected to whom),
several mixing patterns (e.g., assortativity, degree correlation, node segregation),
as well as heterogeneous bond occupation probabilities (e.g., partial and/or uneven
directionality of edges) can be reproduced. On the other hand, the use of the
one-mode projection, coupled with the multitype approach, allows the inclusion of
clustering through a myriad of nontrivial motifs, i.e. recurrent, significant patterns of
interconnections [22].

This paper is organized as follows. In section 2, we introduce the generalization of 
the CM that explicitly includes various correlations and clustering. We then develop 
the analytical framework to obtain the bond percolation properties of this graph 
ensemble in section 3. In section 4, we validate our formalism — and also illustrate the 
versatility of our approach — by comparing its predictions with simulation results on 
a synthetic, but realistic social network. In section 5, we show that many percolation
models published in the literature are special cases of our model. We also highlight
how our approach can be useful for studying interdependent or coupled networks
[10, 23, 24, 25], and for studying the weak and strong clustering regimes [26, 27]. We
conclude in section 6 and present in two appendices some relevant aspects of the analysis
and simulations. Appendix A details how a recent method for analytically computing
the distribution of the composition of components in any small arbitrary graph [21]
can be used in our formalism. Appendix B gives further details on the numerical
simulations.

2. Correlated and clustered graph ensemble 

We introduce a general class of correlated and clustered random graphs. To preserve 
the analytical tractability of the CM, we first consider unclustered multitype bipartite 
graphs that are locally tree-like in the large system size limit. Clustering is then 
incorporated through a projection, analogous to the one-mode projection of [16]. 

2.1. Unclustered multitype bipartite graph ensemble 

We call unclustered multitype bipartite graphs a multitype generalization of the
bipartite CM [5]. These graphs are composed of $M$ types of "regular nodes" and
$\Lambda$ types of "group nodes" (hereafter referred to as nodes and groups, respectively).
Edges only exist between regular nodes and group nodes. In these graphs, a fraction $w_i$
of nodes are of type $i$, and any given type-$i$ node is connected to $k_\mu$ type-$\mu$ groups
(for each $\mu = 1, \ldots, \Lambda$) with a probability
$P_i(k_1, \ldots, k_\Lambda) \equiv P_i(\boldsymbol{k})$. Likewise, a randomly
chosen type-$\nu$ group is connected to $n_j$ type-$j$ nodes (for each $j = 1, \ldots, M$) with a
probability $R_\nu(n_1, \ldots, n_M) \equiv R_\nu(\boldsymbol{n})$. In other words,
$R_\nu(\boldsymbol{n})$ is the distribution of the
composition of type-$\nu$ groups. Figure 1(a) gives an example of such graphs. To lighten
the notation, it should now be understood that any free Latin (resp. Greek) index can
take any value in $\{1, \ldots, M\}$ (resp. $\{1, \ldots, \Lambda\}$), unless otherwise mentioned.

In the large system size limit, $w_i$, $P_i(\boldsymbol{k})$ and $R_\nu(\boldsymbol{n})$ fully define a graph ensemble
which is totally random in all other respects (stubs are matched randomly). All finite
components therefore have a tree-like structure in this limit (the probability of a closed
path goes as the inverse of the size of the graph). These quantities are however not
independent. To guarantee the consistency of the graph ensemble, they must, for all
applicable combinations of $i$, $j$ and $\nu$, satisfy

$$
\frac{w_i \langle k_\nu \rangle_{P_i}}{\langle n_i \rangle_{R_\nu}}
  = \frac{w_j \langle k_\nu \rangle_{P_j}}{\langle n_j \rangle_{R_\nu}}
\tag{1}
$$

where $\langle a \rangle_B$ denotes the mean value of $a$ with respect to the distribution $B$. Simply
stated, (1) asks $w_i$, $P_i(\boldsymbol{k})$ and $R_\nu(\boldsymbol{n})$ to be chosen such that each node type "forces" the
same number of type-$\nu$ groups in the unclustered multitype bipartite graph ensemble.
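The consistency requirement (1) can be checked numerically before building a graph. The
sketch below is only illustrative: the data layout (distributions stored as dictionaries
mapping a tuple to its probability, types indexed from 0) and the function names
`mean`, `groups_forced` and `is_consistent` are assumptions made here, not part of the model.

```python
def mean(dist, index):
    """Mean of component `index` under a distribution given as {tuple: probability}."""
    return sum(prob * vec[index] for vec, prob in dist.items())


def groups_forced(w, P, R, i, nu):
    """Type-nu groups per node implied by type-i nodes: w_i <k_nu>_{P_i} / <n_i>_{R_nu}."""
    return w[i] * mean(P[i], nu) / mean(R[nu], i)


def is_consistent(w, P, R, tol=1e-12):
    """Check condition (1) for each group type, over the node types present in it."""
    for nu in range(len(R)):
        present = [i for i in range(len(w)) if mean(R[nu], i) > 0]
        values = [groups_forced(w, P, R, i, nu) for i in present]
        if any(abs(v - values[0]) > tol for v in values):
            return False
    return True


# Hypothetical example: M = 2 node types and one group type made of pairs
# containing one node of each type.
w = [0.5, 0.5]                    # fraction of nodes of each type
P = [{(2,): 1.0}, {(2,): 1.0}]    # every node belongs to exactly 2 groups
R = [{(1, 1): 1.0}]               # every group holds one node of each type
print(is_consistent(w, P, R))
```

With these inputs both node types imply one group per node, so the ensemble is
consistent; changing `w` to `[0.7, 0.3]` while keeping `P` and `R` breaks condition (1).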




Figure 1. (colour online) Illustration of the projection process introduced in
section 2. (a) In unclustered multitype bipartite graphs, nodes (29 circles) belong
to different types (colours, $M = 3$) and are linked exclusively to groups (13
squares), which are distinguished through types as well (colours, $\Lambda = 7$). (b)
In clustered multitype graphs, nodes linked to a same group in the underlying
unclustered multitype bipartite graph are linked to one another through a motif
whose nature and structure are specified by the corresponding group type. Labels
have been added to nodes and groups for the sake of comparison between (a) and
(b) and are not part of the model.

2.2. Clustered multitype graph ensemble 

A clustered graph ensemble is obtained from the unclustered multitype bipartite graph
ensemble by means of a projection similar to the one-mode projection of [16]. This
projection is achieved by replacing the group nodes in the unclustered multitype
bipartite graphs by motifs involving the nodes that were linked to a same group.
The nature, either quenched (fixed) or annealed (random), and the structure of these
motifs are prescribed by the corresponding group type. The resulting graphs then
consist of different motifs embedded in a tree-like backbone.

Figure 1(b) illustrates a resulting clustered multitype graph where every group is
replaced by a multitype quenched motif. For instance, type-green groups (B, D and
E) are replaced by a triangle composed of one node of each of the $M = 3$ possible
types, and whose edges are undirected except for the one between the type-blue node
and the type-red one, which is directed. Single edges, either directed (C and G) or
undirected (K, L, and M), can also correspond to motifs simply composed of two nodes. The
type of each of the two nodes and the direction of the edge are prescribed by the type of
the group. An example of the use of annealed motifs, where edges exist with a given
probability rather than being a priori fixed, is given in section 4.

Bond percolation can be solved exactly for the CM and its numerous variants
because graphs in these ensembles have an underlying tree-like structure. Thus, to
take advantage of the tree-like backbone of the clustered graph ensemble, the outcome
of bond percolation must be solved beforehand for each motif appearing in the graph
ensemble. This solution is encoded in $Q_{i\nu}(\boldsymbol{l}|\boldsymbol{n})$, giving the probability that $\boldsymbol{l}$ nodes (i.e.,
$l_j$ type-$j$ nodes, for all $j$) will eventually be reached from an initial type-$i$ node by
following existing edges in a type-$\nu$ motif of size $\boldsymbol{n}$. In other words, this distribution
prescribes the number of nodes (and their type) from which a given motif can be left
while navigating on a graph of the clustered ensemble. It therefore "restores" the
tree-like structure of the unclustered multitype bipartite graphs while retaining the
effect of the clustered motifs. It is this correspondence that allows the derivation of a
PGF-formalism which exactly solves the bond percolation properties of the clustered
multitype graph ensemble.

In principle, a wide variety of motifs can be incorporated in our model; this
variety is only limited by our ability to solve the bond percolation outcome on these
motifs. Motifs can be chosen to reproduce recurring patterns of interactions found
in real complex networks [22], to account for local clustering in realistic synthetic
networks (see section 4), or for theoretical investigations (see section 5.5). Following
the results of [21], we give in Appendix A a general method to calculate $Q_{i\nu}(\boldsymbol{l}|\boldsymbol{n})$ for
most, if not all, imaginable motifs of reasonable size (the limits of the method are
discussed in [21]). This method can handle quenched (fixed structure) or annealed
(random structure) motifs in which edges may be directed or not. Also, nodes may
belong to types, which makes it possible to model (dis-)assortative mixing and heterogeneous
bond percolation [6, 21].
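For very small quenched motifs, the outcome of bond percolation can also be obtained by
brute force, which gives a convenient cross-check. The sketch below is not the method
of [21] or of Appendix A: it simply enumerates every occupation pattern of the bonds. The
function name `motif_outcome`, the bond format `(u, v, probability, directed)` and the
convention that the composition counts the initial node are assumptions made here.

```python
from itertools import product


def motif_outcome(node_types, bonds, start, n_types):
    """Exact distribution of the composition of the set of nodes reached from `start`.

    node_types[u] is the type of node u; each bond is (u, v, prob, directed).
    All 2**len(bonds) occupation patterns are enumerated, so this is only
    practical for motifs with a handful of bonds.
    """
    dist = {}
    for pattern in product([False, True], repeat=len(bonds)):
        weight = 1.0
        out = {u: [] for u in range(len(node_types))}
        for occupied, (u, v, prob, directed) in zip(pattern, bonds):
            weight *= prob if occupied else 1.0 - prob
            if occupied:
                out[u].append(v)
                if not directed:
                    out[v].append(u)
        # Depth-first search along occupied bonds, starting from `start`.
        reached, stack = {start}, [start]
        while stack:
            for v in out[stack.pop()]:
                if v not in reached:
                    reached.add(v)
                    stack.append(v)
        comp = tuple(sum(1 for u in reached if node_types[u] == t)
                     for t in range(n_types))
        dist[comp] = dist.get(comp, 0.0) + weight
    return dist


# Hypothetical motif: an undirected triangle of identical nodes,
# each bond occupied independently with probability T.
T = 0.5
triangle = [(0, 1, T, False), (1, 2, T, False), (0, 2, T, False)]
Q = motif_outcome([0, 0, 0], triangle, start=0, n_types=1)
print(Q)
```

For the triangle, the probability of reaching all three nodes is $T^3 + 3T^2(1-T)$, which
equals 0.5 at $T = 0.5$. The cost doubles with every added bond, which is why an
analytical method is needed for larger motifs.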

3. Bond Percolation Properties 

We now introduce a PGF-based mathematical formalism to calculate the percolation 
properties of the correlated and clustered graph ensemble defined in the last section. 
Since PGF-based percolation formalisms have become fairly standard, the unfamiliar 
reader should consult recent reviews on complex network modeling (see for example 
[28] and references therein) for further details. 

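We first define $\theta_{i\nu}(\boldsymbol{x})$ as the function generating the distributions $\{Q_{i\nu}(\boldsymbol{l}|\boldsymbol{n})\}$
of the outcome of bond percolation, from an initial type-$i$ node, on the motifs
corresponding to type-$\nu$ groups. As type-$\nu$ groups may not all have the same
composition (e.g., household size distribution in social networks), $\theta_{i\nu}(\boldsymbol{x})$ is calculated
according to

$$
\theta_{i\nu}(\boldsymbol{x}) = \sum_{\boldsymbol{n}} \frac{n_i R_\nu(\boldsymbol{n})}{\langle n_i \rangle_{R_\nu}}
  \sum_{\boldsymbol{l}} Q_{i\nu}(\boldsymbol{l}|\boldsymbol{n}) \prod_{j} x_{\nu j}^{\,l_j - \delta_{ij}}
\tag{2}
$$

where $l_j - \delta_{ij}$ is the $j$-th component of $\boldsymbol{l} - \boldsymbol{\delta}_i$,
with $\boldsymbol{\delta}_i = (\delta_{i1}, \ldots, \delta_{iM})$ where $\delta_{ij}$ is Kronecker's delta. In (2), we average over
$n_i R_\nu(\boldsymbol{n}) / \langle n_i \rangle_{R_\nu}$
instead of over $R_\nu(\boldsymbol{n})$ to account for the fact that groups containing $n_i$ type-$i$ nodes
are $n_i$ times more likely to be reached from any type-$i$ node than groups containing
only one type-$i$ node. Although (2) is not explicitly labelled in this respect, more than
one motif may be associated with a given group type. In such a case, the distribution
$R_\nu(\boldsymbol{n})$ gives the probability of occurrence of each motif, and the left-hand sum in (2)
is taken over each possible motif for which a distinct distribution $Q_{i\nu}(\boldsymbol{l}|\boldsymbol{n})$ is obtained
with the method outlined in Appendix A.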

The function $\theta_{i\nu}(\boldsymbol{x})$ is the mathematical implementation of the correspondence
between the unclustered and clustered graph ensembles discussed at the end of the last
section. By generating the distribution of $\nu \to j$ edges (i.e., stemming from a type-$\nu$
group and leading to a type-$j$ node) reached by a type-$i$ node, one can then navigate
on an unclustered multitype bipartite graph as if one were on a clustered multitype
graph.

We define $g_i(\boldsymbol{x})$ as the PGF generating the distribution of the number of $\nu \to j$
edges emerging from a type-$i$ node (i.e., emerging from the groups a type-$i$ node is
connected to)

$$
g_i(\boldsymbol{x}) = \sum_{\boldsymbol{k}} P_i(\boldsymbol{k}) \prod_{\nu} \left[ \theta_{i\nu}(\boldsymbol{x}) \right]^{k_\nu} .
\tag{3}
$$

It is also convenient to define a PGF that generates the distribution of the number of
$\nu \to j$ edges emerging from a type-$i$ node which has itself been reached via a $\mu \to i$
edge

$$
f_{\mu i}(\boldsymbol{x}) = \sum_{\boldsymbol{k}} \frac{k_\mu P_i(\boldsymbol{k})}{\langle k_\mu \rangle_{P_i}}
  \prod_{\nu} \left[ \theta_{i\nu}(\boldsymbol{x}) \right]^{k_\nu - \delta_{\mu\nu}} .
\tag{4}
$$

The averaging term used in (4) is motivated by the same argument as the one in (2).
With these two PGFs, we may now compute the percolation properties of the clustered
multitype graph ensemble.
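As a sanity check of (2)-(4), it may help to write them out in the simplest special case.
The restrictions used here are assumptions made purely for illustration: a single node
type and a single group type ($M = \Lambda = 1$, so indices are dropped), every group being
one undirected edge occupied with probability $T$. Each group then contains $n = 2$ nodes,
the weight $n R(n)/\langle n \rangle$ equals 1, and the other end of the edge is reached with
probability $T$:

```latex
\begin{align*}
\theta(x) &= (1 - T) + T x , \\
g(x) &= \sum_{k} P(k) \left[ \theta(x) \right]^{k} , \\
f(x) &= \sum_{k} \frac{k P(k)}{\langle k \rangle_{P}} \left[ \theta(x) \right]^{k-1} .
\end{align*}
```

These are the degree and excess-degree generating functions of the standard CM evaluated
at $1 - T + Tx$, which is the expected form for bond percolation with occupation
probability $T$.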



3.1. Phase transition 

As a class of random graphs, clustered multitype graphs undergo a phase transition
corresponding to the emergence of an extensive connected "giant" component. To
locate the phase transition, let us define $\xi_{\nu j}(s)$ as the average number of $\nu \to j$ edges
at a distance $s$ from any node in any graph of the ensemble. Due to the tree-like
structure of the underlying unclustered multitype bipartite graph, each $\xi_{\nu j}(s)$ is a
linear combination of all $\xi_{\mu i}(s-1)$ at distance $s - 1$:

$$
\xi_{\nu j}(s) = \sum_{\mu, i} B_{\nu j, \mu i} \, \xi_{\mu i}(s-1)
\tag{5}
$$

where

$$
B_{\nu j, \mu i} = \left. \frac{\partial f_{\mu i}(\boldsymbol{x})}{\partial x_{\nu j}} \right|_{\boldsymbol{x} = \boldsymbol{1}}
\tag{6}
$$

is the average number of $\nu \to j$ edges emerging from a type-$i$ node that has been
reached via a $\mu \to i$ edge. In vector notation, (5) becomes

$$
\boldsymbol{\xi}(s) = \mathbf{B} \, \boldsymbol{\xi}(s-1) .
\tag{7}
$$

We see from (7) that, in general, every $\xi_{\nu j}(s)$ vanishes with increasing $s$ if all
eigenvalues of the $(M\Lambda) \times (M\Lambda)$ matrix $\mathbf{B}$ are below 1. Thus the phase transition
happens when the largest eigenvalue of $\mathbf{B}$ reaches unity‡.
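The threshold criterion can be evaluated numerically once the functions $f_{\mu i}$ are
available. The following sketch rests on several assumptions made here for illustration:
the pairs (group type, node type) are flattened into a single index, the derivatives at
$\boldsymbol{x} = \boldsymbol{1}$ are estimated by central finite differences, and the largest
eigenvalue is obtained by plain power iteration. The names `jacobian_at_one`,
`largest_eigenvalue` and `poisson_edge_f` are hypothetical.

```python
import math


def jacobian_at_one(f, size, h=1e-6):
    """Finite-difference estimate of B[a][b] = d f_b / d x_a at x = (1, ..., 1).

    `f` maps a flat list x (one entry per (group type, node type) pair) to a
    flat list of the same length.
    """
    B = [[0.0] * size for _ in range(size)]
    for a in range(size):
        up = [1.0] * size
        down = [1.0] * size
        up[a] += h
        down[a] -= h
        f_up, f_down = f(up), f(down)
        for b in range(size):
            B[a][b] = (f_up[b] - f_down[b]) / (2.0 * h)
    return B


def largest_eigenvalue(B, iterations=500):
    """Power iteration; assumes a non-negative matrix for which the iteration converges."""
    v = [1.0] * len(B)
    lam = 0.0
    for _ in range(iterations):
        w = [sum(row[c] * v[c] for c in range(len(v))) for row in B]
        lam = max(abs(x) for x in w)
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam


def poisson_edge_f(z, T):
    """f(x) for M = Lambda = 1, Poisson degrees of mean z, single-edge groups occupied with T."""
    def f(x):
        theta = 1.0 + (x[0] - 1.0) * T
        return [math.exp(z * (theta - 1.0))]
    return f


B = jacobian_at_one(poisson_edge_f(z=4.0, T=0.25), size=1)
lam = largest_eigenvalue(B)
print(lam)
```

In this one-dimensional example the matrix reduces to the number $zT$, so the chosen
values $z = 4$ and $T = 0.25$ sit at the threshold. Power iteration can fail to converge
for periodic matrices; a library eigenvalue solver would be a safer choice in general.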



3.2. Giant Component 

As there may be directed edges in the graphs (through the motifs), the giant
component may have a "bow-tie" structure [5, 6]. This implies that the probability
$\mathcal{P}$ of reaching the giant component may not be equal to its relative size $\mathcal{S}$. Both
quantities must therefore be computed separately.

Let us define $a_{\mu i}$ as the probability that a $\mu \to i$ edge does not lead to the giant
component. Because of the tree-like structure of finite components in the unclustered
multitype bipartite graph, we see that $a_{\mu i}$ must satisfy the self-consistency relation

$$
a_{\mu i} = f_{\mu i}(\boldsymbol{a}) .
\tag{8}
$$

‡ We see from (5) that $\mathbf{B}$ is a non-negative and, in general, irreducible matrix. Thus the
Perron-Frobenius theorem [29] ensures that the largest eigenvalue of $\mathbf{B}$ is simple, real and
positive. Moreover, the associated eigenvector is the only nonnegative eigenvector of $\mathbf{B}$.

That is, every edge reached from an edge that is not leading to the giant component
must not lead to the giant component either. The probability that any type-$i$ node
does lead to the giant component is therefore given by $\mathcal{P}_i = 1 - g_i(\boldsymbol{a})$, and, averaging
over the node type distribution $\{w_i\}$, the probability $\mathcal{P}$ that a randomly chosen node
leads to the giant component is

$$
\mathcal{P} = \sum_i w_i \mathcal{P}_i = 1 - \sum_i w_i \, g_i(\boldsymbol{a}) .
\tag{9}
$$

To obtain the size of the giant component, we must calculate the probability that a
given node cannot be reached from any node in the giant component. This is equivalent
to computing the probability that this node does not lead to the giant component when
edges are followed in the reverse direction [5, 6]. Edges in the underlying unclustered
multitype bipartite graph being undirected, only $\theta_{i\nu}(\boldsymbol{x})$ needs to be modified. For
instance, this can be achieved by using $p_{sr}$ instead of $p_{rs}$ in (A.1). We denote this
new PGF $\bar{\theta}_{i\nu}(\boldsymbol{x})$ and we will add a bar over every PGF using $\bar{\theta}_{i\nu}(\boldsymbol{x})$ instead of
$\theta_{i\nu}(\boldsymbol{x})$.

Following a similar approach as for computing $\mathcal{P}$, we define $\bar{a}_{\mu i}$ as the probability
that a type-$i$ node cannot be reached from the giant component via a $\mu \to i$ edge.
That is, $\bar{a}_{\mu i}$ is the probability that a neighbour of a type-$i$ node in a type-$\mu$ group is
not part of the giant component. Self-consistency then requires $\bar{a}_{\mu i}$ to satisfy

$$
\bar{a}_{\mu i} = \bar{f}_{\mu i}(\bar{\boldsymbol{a}}) .
\tag{10}
$$

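The probability that any given type-$i$ node is not part of the giant component is
therefore $\bar{g}_i(\bar{\boldsymbol{a}})$. Considering that a fraction $w_i$ of the nodes are of type $i$, the fraction
of the graph occupied by type-$i$ nodes in the giant component is

$$
\mathcal{S}_i = w_i \left[ 1 - \bar{g}_i(\bar{\boldsymbol{a}}) \right] ,
\tag{11}
$$

and the relative size of the giant component is

$$
\mathcal{S} = \sum_i \mathcal{S}_i = 1 - \sum_i w_i \, \bar{g}_i(\bar{\boldsymbol{a}}) .
\tag{12}
$$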
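The self-consistency relation (8) is usually solved by fixed-point iteration. The sketch
below is a minimal illustration under assumptions chosen here: one node type, one group
type made of single undirected edges occupied with probability $T$, and Poisson degrees of
mean $z$, so that $g$ and $f$ coincide. The function name `smallest_fixed_point` and the
variable names are hypothetical.

```python
import math


def smallest_fixed_point(f, size, tol=1e-13, max_iter=100000):
    """Iterate a <- f(a) from a = 0.

    For probability generating functions this sequence increases
    monotonically towards the smallest solution of a = f(a) in [0, 1].
    """
    a = [0.0] * size
    for _ in range(max_iter):
        new = f(a)
        if max(abs(x - y) for x, y in zip(new, a)) < tol:
            return new
        a = new
    return a


z, T = 2.0, 1.0                 # mean degree and bond occupation probability


def theta(x):
    return 1.0 + (x - 1.0) * T


def f(a):                       # relation (8) in this special case
    return [math.exp(z * (theta(a[0]) - 1.0))]


def g(a):                       # generating function (3) in this special case
    return math.exp(z * (theta(a[0]) - 1.0))


a = smallest_fixed_point(f, size=1)
P_giant = 1.0 - g(a)            # relation (9) with a single node type
print(a[0], P_giant)
```

Because all edges are undirected in this example, the barred and unbarred functions
coincide and the probability of reaching the giant component equals its relative size.
Convergence becomes slow close to the threshold, where a Newton-type solver may be
preferable.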



3.3. Distribution of the composition of small components

To calculate the distribution of the number of nodes of each type expected in small
components, we define the PGF $A_{\mu i}(\boldsymbol{x})$ that generates the distribution of the number
of edges of each type (i.e., $\nu \to j$ for all $\nu$ and $j$) that are ahead of a $\mu \to i$ edge
in small components. In the large system size limit, the small components have a
tree-like structure and no finite-size effects are to be expected [i.e., the joint degree
distribution $P_i(\boldsymbol{k})$ is constant]. We therefore expect $A_{\mu i}(\boldsymbol{x})$ to be invariant under
translation on a small component; the distribution of the number of each edge type
ahead, $A_{\mu i}(\boldsymbol{x})$, is independent of the position in a small component. This implies that
$A_{\mu i}(\boldsymbol{x})$ must satisfy

$$
A_{\mu i}(\boldsymbol{x}) = x_{\mu i} \, f_{\mu i}(\boldsymbol{A}(\boldsymbol{x}))
\tag{13}
$$

where the extra $x_{\mu i}$ accounts for the $\mu \to i$ edge that has just been followed. This
extra factor guarantees that a finite extent of the distribution generated by $A_{\mu i}(\boldsymbol{x})$ can
be obtained in a finite number of iterations of (13) starting with the initial conditions
$A_{\mu i}(\boldsymbol{x}) = 1$. Replacing $x_{\nu i} \to z_i$ for all $\nu$ in $A_{\mu i}(\boldsymbol{x})$ generates the distribution of the
number of nodes of each type ahead of a type-$i$ node reached from a type-$\mu$ group.
Thus the composition of a small component reached from a type-$i$ node is generated
by $z_i \, g_i(\boldsymbol{A}(\boldsymbol{z}))$; again the extra $z_i$ accounts for the initial type-$i$ node. Because any
node is of type $i$ with probability $w_i$, the composition of a small component that is
reached from a randomly chosen node is therefore generated by

$$
K(\boldsymbol{z}) = \sum_i \frac{w_i \, z_i \, g_i(\boldsymbol{A}(\boldsymbol{z}))}{1 - \mathcal{P}}
\tag{14}
$$

where $1 - \mathcal{P}$ ensures the normalization of $K(\boldsymbol{z})$. Note that $A_{\mu i}(\boldsymbol{1})$ is equal to the
probability that a $\mu \to i$ edge leads to a finite (small) component, and is therefore
equal to $a_{\mu i}$.
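The iteration of (13) can be carried out with truncated power series. The sketch below
is restricted to an illustrative special case chosen here: one node type, one group type
made of single undirected edges occupied with probability $T$, and a degree distribution
with finite support. The helper names `poly_mul`, `poly_pow` and
`small_component_sizes` are hypothetical, and the output is left unnormalized.

```python
def poly_mul(p, q, n):
    """Multiply two coefficient lists, keeping only powers below n."""
    out = [0.0] * n
    for i, a in enumerate(p):
        if a == 0.0:
            continue
        for j, b in enumerate(q):
            if i + j >= n:
                break
            out[i + j] += a * b
    return out


def poly_pow(p, k, n):
    """k-th power (k >= 0) of a truncated polynomial."""
    out = [1.0] + [0.0] * (n - 1)
    for _ in range(k):
        out = poly_mul(out, p, n)
    return out


def small_component_sizes(P, T, n):
    """Coefficients of z * g(A(z)) up to z**(n - 1).

    P maps a degree k to its probability (positive mean degree assumed).
    Entry s is the probability that a random node sits in a small component
    of s nodes; dividing by 1 - P_giant gives K(z) of (14).
    """
    mean_k = sum(k * p for k, p in P.items())

    def theta_of(A):                   # theta(A(z)) = 1 + (A(z) - 1) T
        return [1.0 - T + T * A[0]] + [T * c for c in A[1:]]

    A = [1.0] + [0.0] * (n - 1)        # initial condition A(z) = 1
    for _ in range(n):                 # pass m fixes all powers below m
        th = theta_of(A)
        f = [0.0] * n
        for k, p in P.items():
            if k > 0:
                term = poly_pow(th, k - 1, n)
                f = [x + k * p / mean_k * y for x, y in zip(f, term)]
        A = [0.0] + f[: n - 1]         # A(z) = z f(A(z)), relation (13)
    th = theta_of(A)
    g = [0.0] * n
    for k, p in P.items():
        g = [x + p * y for x, y in zip(g, poly_pow(th, k, n))]
    return [0.0] + g[: n - 1]          # z g(A(z))


# Hypothetical example: half of the nodes have degree 1, half have degree 3.
sizes = small_component_sizes({1: 0.5, 3: 0.5}, T=0.4, n=20)
print(sizes[1:6])
```

Each pass of the loop fixes at least one additional coefficient, which is the practical
meaning of the remark that a finite extent of the distribution is obtained in a finite
number of iterations. With several node and group types the same scheme applies to
multivariate series, at a much higher cost.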

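Solving (13)-(14) can however become tedious when dealing with a large number
of types of nodes and groups, or large groups. It is therefore worth noting that the
first moments of the distribution generated by $K(\boldsymbol{z})$ can be calculated in a more direct
manner. For instance, let us compute the average number $\langle s_i \rangle$ of type-$i$ nodes in small
components. With

$$
\langle s_i \rangle = \left. \frac{\partial K(\boldsymbol{z})}{\partial z_i} \right|_{\boldsymbol{z} = \boldsymbol{1}}
$$

inserted in (14), replacing $x_{\mu i}$ with $z_i$, we get

$$
\langle s_i \rangle = \frac{w_i (1 - \mathcal{P}_i)}{1 - \mathcal{P}}
  + \sum_{j, \gamma, r} \frac{w_j \langle k_\gamma \rangle_{P_j} \, a_{\gamma j}}{1 - \mathcal{P}}
    \frac{\partial \theta_{j\gamma}(\boldsymbol{a})}{\partial x_{\gamma r}}
    \frac{\partial A_{\gamma r}(\boldsymbol{1})}{\partial z_i}
\tag{15}
$$

where we have used (4), (8) and the fact that $g_i(\boldsymbol{a}) = 1 - \mathcal{P}_i$. In this last result,
$\partial \theta_{j\gamma}(\boldsymbol{a}) / \partial x_{\gamma r}$ is the average number of type-$r$ nodes that are accessible from a type-$j$ node
in a type-$\gamma$ group in small components. Also, $\partial A_{\gamma r}(\boldsymbol{1}) / \partial z_i$ is the average number of type-$i$
nodes ahead of a $\gamma \to r$ edge in small components. Differentiating (13) and using (8), we see
that this last quantity is the solution of

$$
\frac{\partial A_{\gamma r}(\boldsymbol{1})}{\partial z_i} = a_{\gamma r} \, \delta_{ir}
  + \sum_{\lambda, s} \frac{\partial f_{\gamma r}(\boldsymbol{a})}{\partial \theta_{r\lambda}}
    \frac{\partial \theta_{r\lambda}(\boldsymbol{a})}{\partial x_{\lambda s}}
    \frac{\partial A_{\lambda s}(\boldsymbol{1})}{\partial z_i}
\tag{16}
$$

where $\partial f_{\gamma r}(\boldsymbol{a}) / \partial \theta_{r\lambda}$ is the average number of type-$\lambda$ groups to which a type-$r$ node reached
via a type-$\gamma$ group is connected in small components. Thus by solving (8)-(9) and
then (15)-(16), it is possible to obtain quite easily the average number of nodes of
each type in the small components. Equations for higher moments can be obtained in
a similar manner and are straightforward to derive.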
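As a check of (15)-(16), consider the illustrative special case of one node type and one
group type made of single edges occupied with probability $T$, below the percolation
threshold, so that $a = 1$ and $\mathcal{P} = 0$. These restrictions are assumptions made here
only to obtain a closed form; indices are dropped.

```latex
\begin{align*}
\theta(x) &= 1 + (x - 1) T , \qquad
\frac{\partial \theta}{\partial x} = T , \qquad
\left. \frac{\partial f}{\partial \theta} \right|_{\theta = 1}
  = \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k \rangle} , \\
\frac{\partial A(1)}{\partial z}
  &= 1 + T \, \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k \rangle} \,
     \frac{\partial A(1)}{\partial z}
  \;\Longrightarrow\;
  \frac{\partial A(1)}{\partial z}
  = \left[ 1 - T \, \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k \rangle} \right]^{-1} , \\
\langle s \rangle
  &= 1 + \frac{T \langle k \rangle}
              {1 - T \left( \langle k^2 \rangle - \langle k \rangle \right) / \langle k \rangle} .
\end{align*}
```

This is the familiar mean component size of the CM under bond percolation. It diverges
when $T (\langle k^2 \rangle - \langle k \rangle) / \langle k \rangle$ reaches 1, which is the single entry
of $\mathbf{B}$ in this case, in agreement with the threshold criterion of section 3.1.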



4. Illustration and validation 



To illustrate the versatility and the usefulness of our approach, we generated urban
networks [30] and used our formalism to predict the outcome of an outbreak of a
hypothetical infectious disease. In these graphs, three ($M = 3$) types of nodes,
namely adults (type 1), health-care workers (HCW, type 2) and children (type 3),
interact within groups representing households, workplaces, schools and hospitals.
In addition, friendship bonds between children are modeled using a nontrivial motif
(shown in figure B1 in Appendix B), and directed edges from adults and children to
HCW are added to account for the susceptibility of HCW to get infected by people
seeking care in hospitals [31]. The disease spreads from infectious nodes to their
neighbours with probability $T$ called the transmissibility [13]. Further details of these
graphs and of the associated numerical simulations are relegated to Appendix B. It
should be appreciated that these graphs contain a wide range of properties found
in real complex networks such as clustering of several orders (e.g., arbitrary motifs,
heterogeneous Erdős-Rényi cliques), (dis)assortative mixing, degree-degree correlation
and directed edges.

Bond percolation on a class of correlated and clustered random graphs

[Figure 2: plot; horizontal axis $T$, vertical axis from 0 to 1.]

Figure 2. (colour online) Bifurcation diagram of the probability $\mathcal{P}$ to reach the
giant component and the fraction $\mathcal{S}_i / w_i$ of nodes of type $i$ therein as a
function of the occupation probability of edges (or transmissibility) $T$. Types
1, 2 and 3 correspond respectively to adults, HCW and children. Lines represent
the theoretical predictions of our formalism [(8)-(11)] while symbols have been
obtained through numerical simulations (over $10^5$ simulations on graphs of at
least $1.2 \times 10^5$ nodes for each symbol, see Appendix B for further details). The
percolation threshold $T_c \simeq 0.1$ has been obtained by finding the value of $T$ for
which the largest eigenvalue of $\mathbf{B}$ equals 1 [see (7)].

Figure 2 shows the typical bifurcation diagram of the giant-component-related
quantities $\mathcal{P}$ and $\{\mathcal{S}_i\}$. Apart from the excellent agreement between the results of
the numerical simulations and the predictions of our formalism, this figure illustrates
how the multitype approach can highlight the behavioral differences between different
populations, each identified by its own node type, within a same graph ensemble.
In this specific case, the HCW population has purposely been put in the situation
where each HCW has more incoming edges than outgoing edges with adults and
children. Also, the average degree inside the Erdős-Rényi cliques corresponding to
hospitals (300 nodes connected to one another with probability 0.05) is greater than
1 for $T$ greater than $T' = [0.05 \times 299]^{-1} \simeq 0.067$. This implies that these cliques
are increasingly likely to have percolated (i.e., to have a spanning cluster) for $T > T'$.
Qualitatively, once an outbreak reaches the HCW population, it is likely to stay mostly
confined in it and to infect a large proportion of it. Only when $T$ becomes sufficiently
large does the outbreak invade other parts of the population (schools, workplaces and
friendship circles). These insights are corroborated by figure 2. It also shows that
although the HCW population accounts for only 5% of the total population, it drives
the percolation process by pulling down its threshold to $T_c \simeq 0.1$; the other node
types only significantly join (i.e., $\mathcal{S}_i / w_i > 0.01$) the giant component at $T \simeq 0.14$ and
$T \simeq 0.16$, respectively.
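The value of $T'$ quoted above follows from requiring that the mean number of occupied
edges per node inside a hospital clique exceeds 1. A short check of the arithmetic, with
variable names chosen here for illustration:

```python
clique_size = 300        # nodes in a hospital clique
p_edge = 0.05            # probability that two nodes of the clique are linked

# Mean degree inside the clique before bond percolation.
mean_degree = p_edge * (clique_size - 1)

# Occupying each edge with probability T scales the mean degree by T, so the
# clique can develop a spanning cluster once T * mean_degree exceeds 1.
T_prime = 1.0 / mean_degree
print(mean_degree, T_prime)
```

The local threshold $T'$ lies below the global threshold $T_c \simeq 0.1$ reported in
figure 2, which is why the cliques can percolate locally before a giant component emerges.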

Figure 3 shows the distribution of the total number of nodes in small components
for various values of $T$. To support our claim that outbreaks are mostly confined
within the HCW population, figure 3 also displays the distribution of the number of
nodes of type 2 in the small components. The small shift between the two curves is
due to adults and children being infected mostly in households. Again, we find
excellent agreement between the numerical simulations and the theoretical
predictions of the formalism obtained by solving (13)-(14).

Figure 3. (colour online) Distribution of the number of nodes in small
components for various values of the transmissibility $T$ (one colour per value).
Continuous and dashed curves represent the total number of nodes and the number
of type-2 nodes (HCW) in the small components, respectively. Lines were obtained
by solving (13)-(14) and symbols were obtained through numerical simulations
(over 10 s simulations on graphs of at least $4.8 \times 10^5$ nodes for each symbol, see
Appendix B for further details).

Interestingly, figures 3(a)-3(c) give evidence of what one may call the "local
percolation" of the hospital cliques as $T$ increases. For $T < T'$, the size distribution
falls rapidly and monotonically as expected for generic CM graphs [5, 32]. For
$T' < T < T_c$, however, the shape of the distribution changes as local maxima appear.
These are due to the growing spanning cluster in the hospital cliques. For $T > T_c$,
most of the HCW population is part of the giant component, and the spanning cluster
is more and more likely to cover the entire clique as $T$ increases. The HCW nodes that
are not part of the giant component are therefore likely to be part of very large small
components composed of one or more "locally percolated" cliques. This is confirmed
by the multiple maxima seen in figures 3(b)-3(c).

5. Special cases, generalization and applications 



We now demonstrate our claim that our formalism encompasses many percolation
models on random graphs published in the literature. We also succinctly outline a
possible generalization and some straightforward applications of our model.

5.1. Multitype random graphs 

Our formalism naturally falls back on the model introduced in [6] describing the
heterogeneous bond percolation on multitype random graphs. In this class of graphs,
there are $M$ types of nodes, and an $i \to j$ edge is occupied with probability $T_{ij}$. Type-$i$
nodes occupy a fraction $w_i$ of the graph, and a type-$i$ node is connected to $k_j$ type-$j$
nodes (for each $j \in [1, M]$) with probability $P_i(k_1, k_2, \ldots, k_M)$.

Our formalism reproduces this model by using one group type for each possible
(unordered) type of $i \to j$ edge. To each of the $\Lambda = M(M+1)/2$ group types are
associated the functions

$$
\theta_{i\nu}(\boldsymbol{x}) = \left[ 1 + (x_{\nu j} - 1) T_{ij} \right]
$$
$$
\theta_{j\nu}(\boldsymbol{x}) = \left[ 1 + (x_{\nu i} - 1) T_{ji} \right]
$$

depending on whether the edge is considered in the $i \to j$ or in the $j \to i$ direction.
Along with these functions, $P_i(\boldsymbol{k})$ can therefore reproduce the degree distribution
$P_i(k_1, k_2, \ldots, k_M)$.

As shown in [6], multitype random graphs naturally encompass multipartite
graphs, as well as the undirected random graphs introduced in [5, 13, 16]. By assigning
nodes with a given degree to a same node type, our formalism can also reproduce
degree-degree correlation as in [14].
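The mapping between edge types and group types can be made concrete with a few lines of
code. This is only an illustrative sketch: labelling node types from 1 to $M$, storing a
group type as an unordered pair, and the function names `edge_group_types` and
`theta_edge` are choices made here.

```python
from itertools import combinations_with_replacement


def edge_group_types(M):
    """One group type per unordered pair (i, j) of node types, i <= j."""
    return list(combinations_with_replacement(range(1, M + 1), 2))


def theta_edge(x_target, T):
    """theta for a single-edge group: 1 + (x - 1) T, where x is the variable
    attached to the node at the far end and T the occupation probability in
    the direction being followed."""
    return 1.0 + (x_target - 1.0) * T


groups = edge_group_types(3)
print(len(groups), groups)
```

For $M = 3$ the enumeration yields six group types, in agreement with
$\Lambda = M(M+1)/2$. Evaluating `theta_edge` at 1 returns 1, as required for a PGF, and
evaluating it at 0 returns the probability $1 - T$ that the edge is not occupied.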

5.2. Clustered random graphs 

Being a multitype generalization of the highly clustered random graphs introduced in
[16], our model simplifies to the latter in a straightforward manner with $M = \Lambda = 1$
and all groups being Erdős-Rényi cliques. For $\Lambda = 1$, the groups to which any given
node belongs are averaged in (2) so that no correlation whatsoever can be taken into
account.

When considering only $M = 1$ type of nodes but an arbitrary number of uniquely
configured groups [$R_\nu(\boldsymbol{n}) = 1$ for all $\nu$], we retrieve random graphs containing arbitrary
distributions of subgraphs as introduced in [9]. The unweighted average (A.3) plays a
role analogous to that of their role distribution, with correlation being taken into account
by using node types. It is then straightforward to conclude that our formalism also
encompasses the edge-triangle model introduced in [11, 17] and the strong ties model
proposed by [19].

The $\gamma$-theory model [8] can be recovered by considering only $M = 1$ type of nodes,
and by allowing nodes to belong to only one group of size larger than two (Erdős-Rényi
cliques) but to belong to an arbitrary number of groups of size two (external edges).
Also, the random hypergraphs introduced in [7] can be reproduced by our formalism
by considering $M = 3$ types of nodes and $\Lambda = 1$ type of groups which are triangles
composed of one node of each type.

Finally, a class of formalisms [18, 20, 27] uses the multiplicity of edges (the
number of triangles in which an edge participates) to derive an effective branching
process and solve the percolation on clustered graphs using PGFs. Although this
approach tackles percolation from a different perspective, its predictions (i.e., the
percolation threshold and the size of the giant component) can be reproduced with our
model by using fully connected motifs of size $m + 2$ to account for links of multiplicity
$m$ and by appropriately using node and group types to account for the correlations
that this class of models incorporates.

5.3. Directed random graphs 

Our formalism as presented in this paper can only model directed edges between 
different node types. To describe directed edges among a same node type such as in
[5, 12], we would need to subdivide the group type corresponding to directed edges
into an incoming part and an outgoing part (e.g., $\nu \to \nu_{\mathrm{in}}, \nu_{\mathrm{out}}$), and match the
complementary parts to form groups in the unclustered multitype bipartite graph.
In other words, each group is linked to an incoming and an outgoing stub. The
mathematical formalism introduced in section 3 remains valid except that we would 
need to explicitly consider the fact that nodes are reached by their incoming edges 
and are left by their outgoing edges when writing down the equations (see [5, 12] 
for detailed examples). This adjustment is nevertheless straightforward and does not 
affect the generality of our approach. 

5.4. Interdependent or coupled networks

The use of node and group types naturally permits our formalism to be used in the
study of interdependent or coupled networks. In interdependent (or interacting)
networks, node types could for instance be used to distinguish the elements of two
or more interacting networks [10, 23, 24]. Different group types would then allow one to
specify precisely the (nontrivial) interactions within and across the networks. In the
case of coupled, or overlaid, networks [25, 33], elements in a single population interact
in different ways, which is modelled using different edge types. This again can be easily
achieved with our formalism by defining multiple group types, one for each level of
interaction. Again, the generality of our approach gives us access to a wide variety of
complex patterns of interactions in a very detailed fashion.

5.5. Weak and strong clustering regimes 

The existence of two regimes of clustering, weak and strong, has been put forward
in [18, 26, 27] with the conclusion that these two regimes have opposite effects on the
bond percolation threshold. In the weak regime, edges have a multiplicity of either 0
or 1 (single edges or disjoint triangles), and the percolation threshold is higher than
for equivalent unclustered graphs. In the strong regime, edges may contribute to more
than one triangle, and it is argued that the percolation threshold is then lower than
for equivalent unclustered graphs.

Contrariwise, the analysis done in [11, 8] strongly suggests that clustering always
increases the percolation threshold and that the observed lower percolation threshold
in the strong regime is due to assortative mixing instead. Hence, according to these
results, there should be no weak and strong clustering regimes. The use of node and
group (or edge) types in our model can generate clustered and unclustered graphs with
the same correlations (or mixing patterns). It is therefore possible to investigate,
both numerically and analytically, the effect of clustering alone on the percolation
threshold, shedding some light on this contradiction while extending the analysis and
the conclusions of [11, 8]. This will be addressed in a future publication.

6. Conclusion

We have presented a generalization of the Configuration Model allowing for the
inclusion of several nontrivial mixing patterns and clustering. On the one hand, the
use of node and group types makes it possible to explicitly prescribe how nodes are
connected to one another, hence reproducing (dis-)assortative mixing, and indirectly
degree-degree correlation. On the other hand, the use of a one-mode projection can generate
a wide range of nontrivial clustered structures through quenched or annealed motifs. Besides
the modeling of mixing patterns, the multitype approach makes it possible to identify nodes.
This allows one to highlight unusual behaviors or susceptibility of sub-populations of nodes,
as well as to simulate targeted interventions such as attacks, failures, vaccination
or quarantine. We have also demonstrated that our formalism encompasses several
models published to this day, and we have outlined potential applications.

Bridging the gap between empirical network datasets and theoretical models 
is surely one of the principal aims of network theory. Since extracting the effective 
clustered backbone (i.e., motifs) of real networks is still an open problem, our approach 
can only offer a partial answer. However, it provides a comprehensive synthesis of the 
many variants of the CM published to date, and it considerably extends the structural 
complexity of graphs that can be handled theoretically. In this regard, the versatility 
and generality of the present framework could prove useful even beyond the strict 
confines of bond percolation on complex graphs. 

Appendix A. General method to compute $Q_{i\nu}(\mathbf{l}|\mathbf{n})$ for arbitrary 
multitype motifs 

We present a systematic way to compute the outcome of bond percolation, 
$Q_{i\nu}(\mathbf{l}|\mathbf{n})$, on any arbitrary multitype motif whose edges are 
simple and can be directed or not. 

Let us first consider a multitype generalization of Erdős-Rényi random graphs. 
These are composed of $\mathbf{n} = (n_1, \ldots, n_M)$ nodes, where $n_j$ is the number 
of type-$j$ nodes, and a directed edge exists from a type-$i$ node to a type-$j$ node with 
probability $p_{ij}$. Edges exist independently of one another. Note that the symmetric 
case $p_{ij} = p_{ji}$ is statistically equivalent to undirected edges. It has been 
shown [21] that $Q_{i\nu}(\mathbf{l}|\mathbf{n})$ can be obtained by iterating 

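$$Q_{i\nu}(\mathbf{l}|\mathbf{n}) = Q_{i\nu}(\mathbf{l}|\mathbf{l}) \prod_{r} \binom{n_r - \delta_{ir}}{l_r - \delta_{ir}} \prod_{r,s} (1 - p_{rs})^{l_r (n_s - l_s)} \qquad \text{(A.1)}$$

and 

$$Q_{i\nu}(\mathbf{l}|\mathbf{l}) = 1 - \sum_{\mathbf{m} < \mathbf{l}} Q_{i\nu}(\mathbf{m}|\mathbf{l}) \qquad \text{(A.2)}$$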

from the initial condition $Q_{i\nu}(\boldsymbol{\delta}_i|\boldsymbol{\delta}_i) = 1$ 
with $\boldsymbol{\delta}_i = (\delta_{i1}, \ldots, \delta_{iM})$. In essence, knowing 
the probability of finding a component of size $\mathbf{l}$ from a node of type $i$ in a 
graph of size $\mathbf{l}$, (A.1) computes the probability of finding a sub-component of 
size $\mathbf{l}$ but in a graph of size $\mathbf{n}$ ($> \mathbf{l}$). This yields every 
coefficient of the distribution $Q_{i\nu}(\mathbf{l}|\mathbf{n})$ except for the last 
one, corresponding to the case where the whole graph is reachable, which is obtained 
using (A.2). 
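
The iteration of (A.1) and (A.2) can be sketched numerically as follows. This is a 
minimal illustration rather than the implementation used in [21]: the function name 
`component_distribution`, the 0-based type indices and the nested-list layout of `p` 
are assumptions made for the example, the group index $\nu$ is dropped (a single group 
type is considered), and $\mathbf{m} < \mathbf{l}$ is taken to mean $m_r \le l_r$ for 
every $r$ with $\mathbf{m} \neq \mathbf{l}$. 

```python
from functools import lru_cache
from itertools import product
from math import comb, prod


def component_distribution(i, n, p):
    """Q_i(l|n) for a multitype Erdos-Renyi graph probed from a type-i node.

    n[r] is the number of type-r nodes (n[i] >= 1 is assumed) and p[r][s] is
    the probability of a directed edge from a type-r to a type-s node.
    Returns a dict mapping each composition l (a tuple) to its probability.
    """
    n = tuple(n)
    M = len(n)
    delta = tuple(int(r == i) for r in range(M))  # the initial node alone

    def smaller(size):
        # Every l with delta <= l <= size componentwise, except size itself.
        ranges = [range(delta[r], size[r] + 1) for r in range(M)]
        return (l for l in product(*ranges) if l != size)

    def ways(l, size):
        # Binomial factor of (A.1): choices of the other reached nodes.
        return prod(comb(size[r] - delta[r], l[r] - delta[r]) for r in range(M))

    def no_leak(l, size):
        # Probability that no edge goes from the l reached nodes to the rest.
        return prod((1.0 - p[r][s]) ** (l[r] * (size[s] - l[s]))
                    for r in range(M) for s in range(M))

    @lru_cache(maxsize=None)
    def closed(l):
        # (A.2): the whole graph of size l is reached; closed(delta) = 1.
        return 1.0 - sum(partial_reach(m, l) for m in smaller(l))

    def partial_reach(l, size):
        # (A.1): exactly l nodes are reached in a graph of size `size`.
        return closed(l) * ways(l, size) * no_leak(l, size)

    dist = {l: partial_reach(l, n) for l in smaller(n)}
    dist[n] = closed(n)
    return dist
```

For instance, `component_distribution(0, (3,), [[0.5]])` treats a single-type graph of 
three nodes. The cost grows quickly with the number of compositions, so the sketch is 
only meant for motif-sized graphs. 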

Let $\mathcal{G}$ be a multitype motif composed of $\mathbf{n}$ nodes with an arbitrary 
configuration of edges. The associated distribution $Q_{i\nu}(\mathbf{l}|\mathbf{n})$ 
can then be computed by following these simple steps: 

(i) Consider an equivalent multitype Erdős-Rényi graph $\mathcal{G}'$ of size 
$n' = \sum_j n_j$ in which each node belongs to its own unique type (i.e., 
$n'_{j'} = 1$ for all $j' \in \{1, \ldots, n'\}$). Denote by $p'_{i'j'}$ the 
probability for a directed edge to exist from the type-$i'$ node to the type-$j'$ one. 

(ii) Compute $Q'_{1'\nu}(\mathbf{l}'|\mathbf{n}')$ for $\mathcal{G}'$ with (A.1)-(A.2). 
Without any loss of generality, suppose that the initial node from which the graph 
is probed is of type 1 [i.e., (A.1)-(A.2) need to be solved only once]. 

(iii) From $Q'_{1'\nu}(\mathbf{l}'|\mathbf{n}')$, derive the intermediate distribution 
$Q^{(j)}_{i\nu}(\mathbf{l}|\mathbf{n})$ of the number of nodes of each type that are 
accessible from the $j$-th type-$i$ node. This is achieved by replacing the 
artificial node types in $\mathcal{G}'$ by the actual node types in $\mathcal{G}$, and 
by setting the values of $p'_{i'j'}$ according to the configuration of the edges in 
$\mathcal{G}$, which can include type-dependent probabilities of 
existence/occupation of edges. 

(iv) Obtain $Q_{i\nu}(\mathbf{l}|\mathbf{n})$ by computing the unweighted average of 
the $n_i$ distributions $Q^{(j)}_{i\nu}(\mathbf{l}|\mathbf{n})$: 

$$Q_{i\nu}(\mathbf{l}|\mathbf{n}) = \frac{1}{n_i} \sum_{j} Q^{(j)}_{i\nu}(\mathbf{l}|\mathbf{n}) . \qquad \text{(A.3)}$$

A noteworthy point is that the distribution $Q'_{1'\nu}(\mathbf{l}'|\mathbf{n}')$ 
computed for a generic graph of size $n'$ can generate every multitype motif of size 
smaller than $n'$ by appropriate choices of $p'_{i'j'}$. An explicit example of such a 
calculation is given in [21]. 
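
Steps (i)-(iv) can be sketched numerically as follows. This is a minimal illustration 
under stated assumptions, not the symbolic procedure of [21]: the function name 
`motif_distribution`, the 0-based labels and the matrix `p` of per-edge probabilities 
are hypothetical, the group index $\nu$ is dropped, and the recursion is simply rerun 
from each type-$i$ node instead of relabeling one generic solution. With one node per 
artificial type, every binomial factor in (A.1) equals 1, so a composition reduces to 
a set of reached nodes. 

```python
from collections import defaultdict
from functools import lru_cache
from itertools import combinations


def motif_distribution(i, types, p, n_types):
    """Q_i(l|n) for one motif, averaged over its type-i nodes.

    types[a] is the type of node a; p[a][b] is the probability that a directed
    edge a -> b exists and is occupied (0.0 where the motif has no such edge).
    """
    all_nodes = frozenset(range(len(types)))

    def subsets_from(start, pool, proper):
        # Node sets containing `start` taken from `pool` (omit `pool` if proper).
        others = sorted(pool - {start})
        largest = len(others) - 1 if proper else len(others)
        for k in range(largest + 1):
            for extra in combinations(others, k):
                yield frozenset((start,) + extra)

    def no_leak(inside, pool):
        # Probability that no edge leaves `inside` towards the rest of `pool`.
        out = 1.0
        for a in inside:
            for b in pool - inside:
                out *= 1.0 - p[a][b]
        return out

    @lru_cache(maxsize=None)
    def closed(start, pool):
        # (A.2) with one node per type: all of `pool` is reached from `start`.
        return 1.0 - sum(closed(start, s) * no_leak(s, pool)
                         for s in subsets_from(start, pool, proper=True))

    starts = [a for a in all_nodes if types[a] == i]
    dist = defaultdict(float)
    for start in starts:  # step (iii): one distribution per type-i node
        for reached in subsets_from(start, all_nodes, proper=False):
            prob = closed(start, reached) * no_leak(reached, all_nodes)  # (A.1)
            # Replace artificial types by actual ones: count reached nodes per type.
            l = tuple(sum(types[a] == t for a in reached) for t in range(n_types))
            dist[l] += prob / len(starts)  # step (iv): unweighted average (A.3)
    return dict(dist)
```

As a usage example, a single-type triangle whose three undirected edges are each 
occupied with probability 0.5 corresponds to `types = [0, 0, 0]` and a symmetric `p` 
with 0.5 off the diagonal. The enumeration of node subsets is exponential in the motif 
size, which is acceptable only for small motifs. 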

Appendix B. Numerical simulations 

Details of the graphs used in section 4 and of the numerical simulations performed to 
validate our formalism are presented. 

Appendix B.1. Urban networks 

The graphs generated in section 4 were inspired by the urban networks used in 
[30, 31], in which individuals are connected to one another because of their common 
membership in a social group (e.g., households, schools, workplaces, hospitals, 
friendship circles). In this case, the population is divided into three categories, 
identified by node types: adults (type 1), health-care workers (HCW, type 2) 
and children (type 3), with $\{w_i\} = \{0.45, 0.05, 0.50\}$. Every node belongs to one 
household, every HCW belongs to one hospital, every child belongs to one school, 1/9 
of adults belong to one school (teachers, janitors, etc.) and the remaining 8/9 belong 
to one workplace. In addition, every child belongs to one group of friends (see 
figure B1), and adults and children are each connected to at most two randomly chosen 
HCW via a directed edge. 

Table B1 gives the group composition distribution $R_\nu(\mathbf{n})$ used to generate 
the urban networks. Except for friendship circles, the connections between individuals 
within groups are modeled with multitype Erdős-Rényi graphs with different 
probabilities of edge existence. In households, every possible edge exists except for the 
directed edges from HCW to adults, HCW and children, which exist with probability 0.2, 
0.2 and 0.1, respectively. In schools and workplaces, edges exist with probability 0.01, 
and they exist with probability 0.05 in hospitals. The use of relatively large cliques 
with such low probabilities of edge existence makes it possible to model redundancy in 
the neighbourhood of nodes while keeping clustering relatively low. Finally, directed 
edges from adults and children to HCW exist with probability 0.5. 

Table B1. Distribution $R_\nu(\mathbf{n})$ used for the simulations in section 4 with 
$M = 3$ and $\Lambda = 7$. 

Group type                 Composition                        Probability 
                           $\mathbf{n} = (n_1, n_2, n_3)$     $R_\nu(\mathbf{n})$ 

Households                 (2,0,0)                            0.0810 
                           (1,1,0)                            0.0180 
                           (0,2,0)                            0.0010 
                           (2,0,1)                            0.1215 
                           (1,1,1)                            0.0270 
                           (0,2,1)                            0.0015 
                           (2,0,2)                            0.3240 
                           (1,1,2)                            0.0720 
                           (0,2,2)                            0.0040 
                           (2,0,3)                            0.2835 
                           (1,1,3)                            0.0630 
                           (0,2,3)                            0.0035 

Schools                    (5,0,50)                           0.2500 
                           (10,0,100)                         0.5000 
                           (15,0,150)                         0.2500 

Workplaces                 (10,0,0)                           0.1000 
                           (20,0,0)                           0.2500 
                           (30,0,0)                           0.3000 
                           (40,0,0)                           0.2500 
                           (50,0,0)                           0.1000 

Hospitals                  (0,300,0)                          1.0000 

Friendships                (0,0,5)                            1.0000 

Directed edges (1 -> 2)    (1,1,0)                            1.0000 

Directed edges (3 -> 2)    (0,1,1)                            1.0000 

These graphs can be generated in a fairly straightforward manner. For a given 
group type $\nu$, we first generate a sequence of groups whose composition is prescribed 
by $R_\nu(\mathbf{n})$. We then generate, according to $P_i(\mathbf{k})$, a list of nodes 
in which a node belonging to $k_\nu$ type-$\nu$ groups appears $k_\nu$ times. We finally 
assign these nodes to the groups at random, and create edges between nodes that are 
members of the same group according to the probabilities given in the previous paragraph. 
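
The assignment and wiring steps described above can be sketched as follows for a single 
group type. This is a minimal illustration and not the code used for section 4: the 
names `assign_nodes_to_groups` and `wire_group`, the 0-based type indices and the data 
layout are assumptions, the group compositions and memberships are taken as already 
drawn from $R_\nu(\mathbf{n})$ and $P_i(\mathbf{k})$, and edges are treated as 
independent directed edges as in the multitype Erdős-Rényi graphs of Appendix A. 

```python
import random


def assign_nodes_to_groups(compositions, memberships, rng):
    """Randomly fill the groups of one group type.

    compositions: one tuple (n_1, ..., n_M) per group.
    memberships:  memberships[i] is a list of (node_id, k) pairs, where the
                  type-i node node_id belongs to k groups of this type.
    Returns one list of (node_id, type) pairs per group.
    """
    groups = [[] for _ in compositions]
    for i, nodes in enumerate(memberships):
        # A node belonging to k groups appears k times in the list of stubs.
        stubs = [node for node, k in nodes for _ in range(k)]
        # Each group offers n_i slots to type-i nodes.
        slots = [g for g, n in enumerate(compositions) for _ in range(n[i])]
        if len(stubs) != len(slots):
            raise ValueError(f"type {i}: {len(stubs)} stubs for {len(slots)} slots")
        rng.shuffle(stubs)
        for node, g in zip(stubs, slots):
            groups[g].append((node, i))
    return groups


def wire_group(members, p, rng):
    """Directed edges inside one group; p[r][s] is the type-r to type-s probability."""
    edges = []
    for a, type_a in members:
        for b, type_b in members:
            if a != b and rng.random() < p[type_a][type_b]:
                edges.append((a, b))
    return edges
```

As with any stub-matching scheme, a node may occasionally be assigned twice to the same 
group; the sketch leaves such cases untouched, on the assumption that they become rare 
in large graphs. 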

Appendix B.2. Percolation simulations 

Graphs that were used to obtain the results shown in section 4 were composed of at 
least $1.2 \times 10^5$ nodes. For $T$ around $T_c$, larger graphs (up to 
$9.6 \times 10^6$ nodes) have been generated to help distinguish small components, which 
are intensive, from the giant component, which is extensive. At least $10^3$ ($10^6$) 
graphs were generated for each value of $T$ used in figure 2 (figure 3). 

Figure B1. Motif used to model friendship bonds between children in the urban 
network used in section 4. 

For each generated graph, 100 percolation simulations were performed. Each 
consists of randomly choosing a starting node and then following every possible edge 
leaving this node, and the subsequently encountered nodes, with probability $T$ 
until no new node can be reached. The component size is then simply the number 
of nodes that have been reached. While it would have been straightforward to use a 
type-specific probability $T$ (see section 5.1), we have used a single value to lighten 
the presentation of the results. 
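
One such percolation simulation can be sketched as a breadth-first exploration in which 
each edge is followed with probability $T$. This is a minimal illustration, not the code 
used to produce the results of section 4: the names `component_size` and `sample_sizes` 
and the adjacency-dictionary representation `adj` are assumptions made for the example. 

```python
import random
from collections import deque


def component_size(adj, start, T, rng):
    """Number of nodes reached from `start` when each edge is followed with probability T.

    adj[u] lists the nodes reachable from u through one (directed) edge; an
    undirected edge is represented by listing it in both directions.
    """
    reached = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            # Each edge leaving u is tried once, because u is dequeued only once.
            if v not in reached and rng.random() < T:
                reached.add(v)
                queue.append(v)
    return len(reached)


def sample_sizes(adj, T, runs, rng):
    """Component sizes from `runs` independent, randomly chosen starting nodes."""
    nodes = list(adj)
    return [component_size(adj, rng.choice(nodes), T, rng) for _ in range(runs)]
```

With `runs=100`, `sample_sizes` mirrors the 100 simulations performed per generated 
graph. A type-specific probability could be accommodated by replacing the scalar `T` 
with a lookup on the types of `u` and `v`. 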